Automatic medieval charters structure detection : A Bi-LSTM linear segmentation approach

Abstract

This paper presents a model aiming to automatically detect sections in medieval Latin charters. These legal sources are some of the most important for medieval studies, as they reflect economic and social dynamics as well as institutional writing practices. An automatic linear segmentation can greatly facilitate charter indexation and speed up the recovery of evidence to support historical hypotheses by means of granular inquiries on these raw, rarely structured sources. Our model is based on a Bi-LSTM approach using a final CRF layer and was trained on a large, annotated collection of charters (4,700 documents) coming from Lombard monasteries: the CDLM corpus (11th-12th centuries). The evaluation shows high performance on the test set and on an external set consisting of charters from the Montecassino abbey (10th-12th centuries). We describe the architecture of the model and the main problems related to the treatment of formulaic discourse, and we discuss the implications of the results in terms of record-keeping practices in the High Middle Ages.
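To illustrate what a linear segmentation yields, the sketch below collapses a per-token label sequence into contiguous sections. It is not the paper's implementation: it assumes a sequence tagger (such as a Bi-LSTM with a CRF layer) has already assigned one section label to each token. The function name `labels_to_segments`, the label names `protocol`, `text` and `eschatocol`, and the toy Latin token sequence are all hypothetical choices made for the example.

```python
from itertools import groupby
from typing import List, Tuple


def labels_to_segments(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-token labels into (label, start, end) spans, end exclusive."""
    segments = []
    position = 0
    # groupby yields one run per maximal stretch of identical adjacent labels
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, position, position + length))
        position += length
    return segments


# Invented toy input: nine tokens and one hypothetical section label per token
tokens = "In nomine Domini ego Petrus dono terram Actum Mediolani".split()
predicted = ["protocol"] * 3 + ["text"] * 4 + ["eschatocol"] * 2

for label, start, end in labels_to_segments(predicted):
    print(label, " ".join(tokens[start:end]))
```

Because every token belongs to exactly one span and the spans follow one another without gaps, the output is a partition of the document, which is what makes the segmentation linear.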

Similar resources

Automatic Writer Identification in Medieval Papal Charters

Automatic writer identification and writer verification have recently received significant attention in the field of historical analysis. In this work a short overview of current approaches for writer identification is given. Current state-of-the-art results on contemporary data are related to different approaches for writer verification on a small dataset of datum lines extracted from papal cha...

Dating medieval English charters

Deeds, or charters, dealing with property rights, provide a continuous documentation which can be used by historians to study the evolution of social, economic and political changes. This study is concerned with charters (written in Latin) dating from the tenth through early fourteenth centuries in England. Of these, at least one million were left undated, largely due to administrative changes ...

Challenges in Annotating Medieval Latin Charters

No annotation guidelines concerning substandard Latin are presently available. This paper describes an annotation style of substandard Latin that supplements the method designed for standard Latin by the Perseus Latin Dependency Treebank and the Index Thomisticus Treebank. Each word of the corpus can be assigned only one morphological analysis. In our system, the analysis can be either function...

Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation

Recurrent neural network (RNN) has been broadly applied to natural language processing (NLP) problems. This kind of neural network is designed for modeling sequential data and has been testified to be quite efficient in sequential tagging tasks. In this paper, we propose to use bi-directional RNN with long short-term memory (LSTM) units for Chinese word segmentation, which is a crucial preprocess ...

Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM

Arabic word segmentation is essential for a variety of NLP applications such as machine translation and information retrieval. Segmentation entails breaking words into their constituent stems, affixes and clitics. In this paper, we compare two approaches for segmenting four major Arabic dialects using only several thousand training examples for each dialect. The two approaches involve posing th...

Journal

Journal title: Journal of Data Mining and Digital Humanities

Year: 2022

ISSN: 2416-5999

DOI: https://doi.org/10.46298/jdmdh.8646